ggplot2

ggplot2 is the most elegant and aesthetically pleasing graphics framework available in R. The way you make plots in ggplot2 is very different from base graphics, which makes the learning curve steep. That said, it’s totally worth it.

# Load the ggplot2 package at the top of each document so R can find the functions, data, etc. that live inside it
library(ggplot2)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ✔ purrr   0.3.3
## ── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

It’s essential that you properly organize your data into a data frame before you start with ggplot2. This is why we spent the last week or two learning ways to transform and wrangle data into different formats.
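As a rough illustration of what "properly organized" can mean, the sketch below reshapes a small made-up table so that each row is one observation, a layout that maps cleanly onto ggplot2 aesthetics. The `wide` data frame and its column names are hypothetical, and the example assumes tidyr 1.0.0 or later for `pivot_longer()`.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one column per year (values made up for illustration)
wide <- data.frame(
  city  = c("A", "B"),
  y2018 = c(10, 20),
  y2019 = c(12, 18),
  y2020 = c(15, 17)
)

# Reshape so each row is one city-year observation
long <- pivot_longer(wide, cols = -city,
                     names_to = "year", values_to = "value")

# Each aesthetic can now be mapped to a single column
p_long <- ggplot(long, aes(x = year, y = value, group = city, col = city)) +
  geom_line()
p_long
```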

Once you have your data ready to go, you gradually add bits and pieces to it to create a plot. Plots are built up in layers, with the typical ordering being

  1. Plot the data
  2. Overlay a summary
  3. Add metadata and annotation
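The three steps above might look like this in code. This is only a sketch: using `geom_smooth()` as the summary layer and the title text are assumptions chosen for illustration.

```r
library(ggplot2)

p_layers <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +                            # 1. plot the data
  geom_smooth(method = "lm", se = FALSE) +  # 2. overlay a summary (a fitted straight line)
  labs(title = "City MPG vs. Highway MPG")  # 3. add annotation
p_layers
```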

Basics

We will be working with the dataset mpg.

data(mpg)

The Setup

# ggplot ( dataframe, aes(x=xvariable, y=yvariable))
ggplot(mpg, aes(cty, hwy))

A blank ggplot is drawn. Even though x and y are specified, there are no points or lines in it. This is because ggplot doesn’t assume that you meant a scatterplot or a line chart to be drawn. I have only told ggplot what dataset to use and which columns should be used for the x and y axes. I haven’t explicitly asked it to draw any points.
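One way to see this for yourself is to count the layers stored in the plot object. The sketch below assumes the plot object exposes a `layers` list (true for the ggplot2 3.x series); the object names `p_blank` and `p_points` are made up for illustration.

```r
library(ggplot2)

# Data and axes only: nothing has been asked to be drawn yet
p_blank <- ggplot(mpg, aes(cty, hwy))
length(p_blank$layers)

# Adding a geom gives the plot its first layer
p_points <- p_blank + geom_point()
length(p_points$layers)
```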

Plotting Points

The basics:

ggplot(mpg, aes(cty, hwy)) +
  geom_point()

To customize colors, plotting characters, size:

ggplot(mpg, aes(cty, hwy)) +
  geom_point(col="steelblue", pch=1, size=2)

A list of possible pch values
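If the picture of plotting characters is not at hand, a chart of them can be drawn directly. This is a sketch, not the course's original figure: the grid layout and the `shapes` data frame are assumptions. Values 0 through 25 are the standard symbol codes, and codes 21 through 25 are the ones that accept a fill color.

```r
library(ggplot2)

# One row per plotting character, laid out on a 7-column grid
shapes <- data.frame(
  pch = 0:25,
  x   = (0:25) %% 7,
  y   = -((0:25) %/% 7)
)

p_shapes <- ggplot(shapes, aes(x, y)) +
  geom_point(aes(shape = pch), size = 4, fill = "steelblue") +  # fill only affects 21-25
  geom_text(aes(label = pch), nudge_y = -0.35, size = 3) +      # print the code under each symbol
  scale_shape_identity() +                                      # use the pch numbers as-is
  theme_void()
p_shapes
```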

Adding Layers

Each + adds another layer to the plot. The points come from a geom layer called geom_point; here labs() then adds a title, subtitle, axis labels, and a caption on top of the scatterplot.

ggplot(mpg, aes(cty, hwy)) +
  geom_point(col="steelblue", size=2) +
  labs(title="City MPG vs. Highway MPG", 
         subtitle="this is a subtitle", 
         x="City MPG", 
         y="Highway MPG", 
         caption="source: mpg dataset") 

Colors by Group

gg <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col=class), size=2) +
  labs(title="City MPG vs. Highway MPG", 
         subtitle="this is a subtitle", 
         x="City MPG", 
         y="Highway MPG", 
         caption="source: mpg dataset") 

gg

As an added benefit, the legend is added automatically. If needed, it can be removed by setting legend.position to "none" inside a theme() call.

gg + theme(legend.position="none")

You can also change the color palette entirely.

gg + scale_colour_brewer(palette="Spectral")

More palettes like this can be found in the RColorBrewer package.

RColorBrewer palettes
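The palettes can also be inspected from the console. A minimal sketch, assuming the RColorBrewer package is installed (the variable name `spectral7` is made up for illustration):

```r
library(RColorBrewer)

# HEX codes for 7 colors taken from the "Spectral" palette
spectral7 <- brewer.pal(7, "Spectral")
spectral7

# Draw a chart of every available palette in the plotting window
display.brewer.all()
```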

You can also build your own color palettes using the built-in colors in R or by using HEX codes (i.e., #RRGGBB).

R Built In Colors
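A hand-built palette might be sketched as follows. The particular colors are arbitrary choices for illustration; the names of the vector match the seven values of `class` in the mpg data, so each class gets a fixed color regardless of ordering.

```r
library(ggplot2)

# Mix of built-in color names and HEX codes, one per vehicle class
class_colors <- c(
  "2seater"    = "firebrick",
  "compact"    = "#1B9E77",
  "midsize"    = "#D95F02",
  "minivan"    = "#7570B3",
  "pickup"     = "goldenrod",
  "subcompact" = "#66A61E",
  "suv"        = "steelblue"
)

p_manual <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = class), size = 2) +
  scale_colour_manual(values = class_colors)
p_manual
```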

We will spend more time later in the course discussing best practices for color choices, but for now keep in mind:

  • use intuitive/meaningful colors, if possible
  • make sure to use colors with high contrast (exception: avoid red and green if possible)

Adding Text, Labels, and Annotation

ggplot(mpg, aes(cty, hwy, label=model)) +
  geom_point(aes(col=class), size=2) +
  labs(title="City MPG vs. Highway MPG", 
         subtitle="this is a subtitle", 
         x="City MPG", 
         y="Highway MPG", 
         caption="source: mpg dataset") +
  geom_text(size=2)
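With this many points the labels pile on top of each other. One possible variation, sketched here, drops overlapping labels with `check_overlap` and places a single free-standing note with `annotate()`. The note text and its coordinates are hypothetical, picked only to land in an empty corner of the plot.

```r
library(ggplot2)

p_text <- ggplot(mpg, aes(cty, hwy, label = model)) +
  geom_point(aes(col = class), size = 2) +
  # check_overlap = TRUE skips any label that would cover an earlier one
  geom_text(size = 2, check_overlap = TRUE, vjust = -1) +
  # annotate() adds one piece of text at fixed coordinates (values are made up)
  annotate("text", x = 30, y = 20, label = "hypothetical note", size = 3)
p_text
```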

Using Themes

Themes can be a useful way to “style” an entire graph at once. Common themes are theme_classic(), theme_dark(), theme_bw(), and theme_grey().

gg + theme_grey()
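Complete themes can be combined with individual theme() tweaks, but the order matters: a complete theme replaces any settings made before it. The sketch below rebuilds a plot like gg under the hypothetical name `gg_demo` so that it runs on its own.

```r
library(ggplot2)

gg_demo <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(aes(col = class), size = 2)

# Complete theme first, then the tweak: the legend moves to the bottom
p_tweaked <- gg_demo + theme_bw() + theme(legend.position = "bottom")
p_tweaked

# Reversed order: theme_bw() resets the legend to its default position
p_reset <- gg_demo + theme(legend.position = "bottom") + theme_bw()
p_reset
```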

The ggthemes package contains lots of additional themes, including theme_wsj() (Wall Street Journal), theme_economist() (The Economist), theme_fivethirtyeight() (Five Thirty Eight), etc.

#make sure you have run install.packages("ggthemes") on your computer at some point
library(ggthemes)

gg + theme_fivethirtyeight()

More Plot Types

Histograms

Histograms should be used for one continuous variable.

# base R equivalent: hist(mpg$cty)

ggplot(mpg, aes(cty))  +
  geom_histogram(binwidth=2)
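The shape of a histogram depends heavily on how the bars are sized. As a sketch, the two calls below compare a fixed bar width with a fixed number of bars; the specific values (`binwidth = 1`, `bins = 10`) are arbitrary choices for illustration.

```r
library(ggplot2)

# Fixed bar width: each bar spans 1 mile per gallon
p_width <- ggplot(mpg, aes(cty)) +
  geom_histogram(binwidth = 1)

# Fixed number of bars: ggplot2 works out the width
p_bins <- ggplot(mpg, aes(cty)) +
  geom_histogram(bins = 10)

p_width
p_bins
```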

Boxplots

Boxplots should be used for one continuous variable. Side-by-side boxplots are good for comparing a numerical variable across the levels (categories) of a categorical variable.

mpg %>% 
  mutate(class = reorder(class, cty, FUN=median)) %>% 
  ggplot(aes(x=class, y=cty)) +
  geom_boxplot(fill="steelblue", outlier.size = 0)
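The reorder() call is what sorts the boxes from lowest to highest median. A small sketch of what it does on its own, outside the pipeline (the names `class_by_median` and `class_medians` are made up for illustration):

```r
library(ggplot2)  # provides the mpg data

# reorder() returns a factor whose levels are sorted by a summary of another variable
class_by_median <- reorder(mpg$class, mpg$cty, FUN = median)
levels(class_by_median)

# Median city MPG per class, listed in the new level order
class_medians <- tapply(mpg$cty, class_by_median, median)
class_medians
```

Because ggplot2 draws categories in factor-level order, the boxes appear in this same order along the x axis.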

Barplots

Barplots should be used for one or two categorical variables.

mpg %>% 
  mutate(unit = 1) %>% 
  mutate(manufacturer = reorder(manufacturer, unit, FUN=sum)) %>% 
  ggplot(aes(x=manufacturer)) +
  geom_bar() +
  labs(title="Barplot of One Categorical Variable", subtitle="Manufacturers of Cars") +
  theme(axis.text.x=element_text(angle=90))

# Optional: add + coord_flip() to the plot above to draw horizontal bars instead
mpg %>% 
  mutate(unit = 1) %>% 
  mutate(manufacturer = reorder(manufacturer, unit, FUN=sum)) %>% 
  ggplot(aes(x=manufacturer)) +
  geom_bar() +
  # ..count.. is the computed bar height; newer ggplot2 releases prefer after_stat(count)
  geom_text(stat='count', aes(label=..count..), vjust=-1) +
  labs(title="Barplot of One Categorical Variable", subtitle="Manufacturers of Cars") +
  theme(axis.text.x=element_text(angle=90))

mpg %>% 
  ggplot(aes(manufacturer)) +
  geom_bar(aes(fill=class)) +
  coord_flip() +
  scale_fill_brewer(palette="Spectral")

There are many different ways to modify a theme: the legend, where the axis ticks go, the background colors, the position of text, the font, etc. You can see the full set of options by typing ?theme into the console.

scale_color_brewer() and scale_color_manual() apply to the color aesthetic (points, lines, etc.). scale_fill_brewer() and scale_fill_manual() apply to the fill aesthetic (barplots, boxplots, etc.).
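The difference between color and fill, together with a couple of theme() options, might be sketched as follows. The color choices and the use of the `drv` column (whose values are "4", "f", and "r") are assumptions for illustration.

```r
library(ggplot2)

p_fill <- ggplot(mpg, aes(class)) +
  # fill colors the inside of each bar; col draws its outline
  geom_bar(aes(fill = drv), col = "black") +
  scale_fill_manual(values = c("4" = "steelblue", "f" = "goldenrod", "r" = "firebrick")) +
  # two of the many theme() options: legend placement and axis text angle
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 45, hjust = 1))
p_fill
```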

Time Series

gapminder <- read.csv("https://ebmwhite.github.io/MATH0216/activities/gapminder.csv")
gapminder %>% 
  ggplot(aes(x=year, y=lifeExp, group=country)) +
  geom_line()

gapminder %>% 
  group_by(continent, year) %>% 
  summarize(lifeExp = mean(lifeExp)) %>% 
  ggplot(aes(x=year, y=lifeExp,color=continent)) +
  geom_line()
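The group aesthetic in the first plot is what gives each country its own line. A minimal sketch with made-up data (the `toy` data frame is hypothetical and stands in for gapminder) shows the difference:

```r
library(ggplot2)

# Made-up data: two countries measured in three years
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(2000, 2005, 2010), times = 2),
  lifeExp = c(60, 62, 65, 70, 71, 73)
)

# Without group: a single zig-zag line through all six points
p_one <- ggplot(toy, aes(x = year, y = lifeExp)) +
  geom_line()

# With group: one line per country
p_grouped <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line()

p_one
p_grouped
```

Mapping a categorical variable to color, as in the continent plot, groups the lines in the same way, which is why that plot needs no explicit group.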

Quick Reference

Here are some resources that may be useful quick reference guides for ggplot2: